Continuous monitoring of blood glucose (BG) levels is a key aspect of diabetes management. Patients with Type-1 diabetes (T1D) require an effective tool to monitor these levels in order to make appropriate decisions regarding insulin administration and food intake to keep BG levels in the target range. Effectively and accurately predicting future BG levels multiple time steps ahead benefits a patient with diabetes by helping them decrease the risks of extremes in BG, including hypo- and hyperglycemia. In this study, we present a novel multi-component deep learning model that predicts BG levels in a multi-step look-ahead fashion. The model is evaluated both quantitatively and qualitatively on actual blood glucose data from 97 patients. For a prediction horizon (PH) of 30 minutes, the average values of root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and normalized root mean squared error (NRMSE) are 23.22 ± 6.39 mg/dL, 16.77 ± 4.87 mg/dL, 12.84 ± 3.68, and 0.08 ± 0.01, respectively. When Clarke and Parkes error grid analyses were performed comparing predicted BG with actual BG, the results showed average percentages of points in Zone A of 80.17 ± 9.20 and 84.81 ± 6.11, respectively. We offer this tool as a mechanism to enhance the predictive capabilities of algorithms for patients with T1D.
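For concreteness, here is a minimal sketch (not the authors' model or code) of how the four point-accuracy metrics above can be computed over a batch of multi-step forecasts. The array shapes and the range normalization used for NRMSE are assumptions, since the study's exact conventions are not given here.

import numpy as np

def forecast_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """RMSE, MAE, MAPE (%), and NRMSE over a batch of multi-step forecasts."""
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err) / np.abs(y_true)) * 100.0)
    # NRMSE: RMSE normalized by the observed range (one common convention;
    # the normalization used in the study above is not specified here).
    nrmse = rmse / float(y_true.max() - y_true.min())
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "NRMSE": nrmse}

# Hypothetical shapes: 32 forecast windows, 6 steps of 5 minutes = 30-minute PH.
rng = np.random.default_rng(0)
y_true = rng.uniform(70.0, 180.0, size=(32, 6))   # BG in mg/dL
y_pred = y_true + rng.normal(0.0, 20.0, size=y_true.shape)
print(forecast_metrics(y_true, y_pred))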
-
Scientific data, generated by computational models or from experiments, are typically the result of nonlinear interactions among several latent processes. Such datasets are high-dimensional and exhibit strong temporal correlations. Better understanding of the underlying processes requires mapping such data to a low-dimensional manifold where the dynamics of the latent processes are evident. While nonlinear spectral dimensionality reduction methods, e.g., Isomap, and their scalable variants are conceptually well-suited candidates for obtaining such a mapping, the presence of strong temporal correlation in the data can significantly impact these methods. In this paper, we first show why such methods fail when dealing with dynamic process data. A novel method, Entropy-Isomap, is proposed to address this shortcoming. We demonstrate the effectiveness of the proposed method in the context of understanding the fabrication process of organic materials. The resulting low-dimensional representation correctly characterizes the process control variables and allows for informative visualization of the material morphology evolution.
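As a point of reference, here is a minimal sketch of the standard Isomap baseline discussed above, applied to synthetic high-dimensional data driven by a slowly varying latent process. Entropy-Isomap itself is the paper's contribution and is not reproduced here; the data-generating choices are illustrative.

import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)

# Simulate a slowly varying latent process (a random walk) observed through a
# fixed nonlinear lift into 50 dimensions, giving strong temporal correlation.
n_steps, latent_dim, ambient_dim = 2000, 2, 50
latent = np.cumsum(rng.normal(scale=0.01, size=(n_steps, latent_dim)), axis=0)
lift = rng.normal(size=(latent_dim, ambient_dim))
data = np.tanh(latent @ lift) + rng.normal(scale=0.01, size=(n_steps, ambient_dim))

# Standard Isomap: k-nearest-neighbor graph, geodesic distances, classical MDS.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(data)
print(embedding.shape)  # (2000, 2)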
-
Analyzing database access logs is a key part of performance tuning, intrusion detection, benchmark development, and many other database administration tasks. Unfortunately, it is common for production databases to handle millions of queries or more each day, so these logs must be summarized before they can be used. Designing an appropriate summary encoding requires trading off between conciseness and information content; for example, simple workload sampling may miss rare but high-impact queries. In this paper, we present LogR, a lossy log compression scheme suitable for use in many automated log analytics tools, as well as for human inspection. We formalize and analyze the space/fidelity trade-off in the context of a broader family of "pattern" and "pattern mixture" log encodings to which LogR belongs. We show through a series of experiments that LogR-compressed encodings can be created efficiently, come with provable information-theoretic bounds on their accuracy, and outperform state-of-the-art log summarization strategies.
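To illustrate the general "pattern" idea behind such encodings (this is not the LogR scheme itself), here is a minimal sketch that collapses queries into templates by masking literals and summarizes the log as template counts; the masking rules and example queries are illustrative.

import re
from collections import Counter

def template(query: str) -> str:
    """Collapse a query to its pattern by masking literals and whitespace."""
    q = query.strip().lower()
    q = re.sub(r"'[^']*'", "?", q)           # string literals -> ?
    q = re.sub(r"\b\d+(\.\d+)?\b", "?", q)   # numeric literals -> ?
    return re.sub(r"\s+", " ", q)            # normalize whitespace

log = [
    "SELECT * FROM accounts WHERE id = 42",
    "SELECT * FROM accounts WHERE id = 7",
    "SELECT name FROM users WHERE city = 'Buffalo'",
]
summary = Counter(template(q) for q in log)
for pattern, count in summary.most_common():
    print(count, pattern)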
-
Manifold learning based methods have been widely used for non-linear dimensionality reduction (NLDR). However, in many practical settings, the need to process streaming data is a challenge for such methods, owing to the high computational complexity involved. Moreover, most methods operate under the assumption that the input data are sampled from a single manifold embedded in a high-dimensional space. We propose a method for streaming NLDR when the observed data are either sampled from multiple manifolds or irregularly sampled from a single manifold. We show that existing NLDR methods, such as Isomap, fail in such situations, primarily because they rely on the smoothness and continuity of the underlying manifold, which are violated in the scenarios explored in this paper. In contrast, the proposed algorithm is able to learn effectively in the presence of multiple, and potentially intersecting, manifolds, while allowing the input data to arrive as a massive stream.
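Here is a minimal sketch of the failure setting described above, assuming a toy example of two intersecting one-dimensional manifolds (circles) in three dimensions: standard Isomap builds a single neighborhood graph, so points near the intersections get connected across manifolds and geodesic distances are distorted. The proposed streaming multi-manifold algorithm is not reproduced here.

import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, size=500)

# Two unit circles, one in the xy-plane and one in the xz-plane, intersecting
# at (1, 0, 0) and (-1, 0, 0), plus a little observation noise.
circle_a = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
circle_b = np.stack([np.cos(t), np.zeros_like(t), np.sin(t)], axis=1)
data = np.concatenate([circle_a, circle_b])
data += rng.normal(scale=0.01, size=data.shape)

# A single global Isomap embedding mixes the two manifolds near the intersections.
embedding = Isomap(n_neighbors=8, n_components=2).fit_transform(data)
print(embedding.shape)  # (1000, 2)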
-
Database access logs are the starting point for many forms of database administration, from performance tuning, to security auditing, to benchmark design, and many more. Unfortunately, query logs are also large and unwieldy, and it can be difficult for an analyst to extract broad patterns from the set of queries found therein. Clustering is a natural first step towards understanding these massive query logs. However, many clustering methods rely on a notion of pairwise similarity, which is challenging to compute for SQL queries, especially when the underlying data and database schema are unavailable. We investigate the problem of computing similarity between queries, relying only on the query structure. We conduct a rigorous evaluation of three query similarity heuristics proposed in the literature, applied to query clustering on multiple query log datasets representing different types of query workloads. To improve the accuracy of the three heuristics, we propose a generic feature engineering strategy that uses classical query rewrites to standardize query structure. The proposed strategy results in a significant improvement in the performance of all three similarity heuristics.
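Here is a minimal sketch of structure-only query similarity under assumptions of this kind (the three heuristics evaluated in the paper are not reproduced): a simple literal-masking pass stands in for rewrite-based standardization, after which token counts are compared with cosine similarity.

import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def standardize(query: str) -> str:
    """A stand-in for rewrite-based standardization: lowercase, mask literals."""
    q = re.sub(r"'[^']*'|\b\d+(\.\d+)?\b", "?", query.lower())
    return re.sub(r"\s+", " ", q).strip()

queries = [
    "SELECT name FROM users WHERE age > 30",
    "SELECT name FROM users WHERE age > 21",
    "SELECT COUNT(*) FROM orders GROUP BY customer_id",
]
vectors = CountVectorizer(token_pattern=r"[a-z_*()?.]+").fit_transform(
    [standardize(q) for q in queries]
)
print(cosine_similarity(vectors).round(2))  # first two queries become identical

After standardization, the first two queries collapse to the same structure and score a similarity of 1.0, which is the kind of effect rewrite-based feature engineering aims for.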
-
Insider threats to databases in the financial sector have become a serious and pervasive security problem. This paper proposes a framework for analyzing access patterns to databases by clustering the SQL queries issued to the database. Our system, Ettu, works by grouping queries together with other similarly structured queries. The small number of resulting intent groups can then be efficiently labeled by human operators. We describe how the system is designed and how its components work. Our preliminary results show that the system accurately models user intent.
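Here is a minimal sketch (not Ettu's actual pipeline) of grouping similarly structured queries: constants are masked, pairwise similarity comes from a generic string matcher, and complete-linkage clustering yields a small number of groups that an operator could label with an intent. The threshold and example queries are illustrative.

import re
from difflib import SequenceMatcher

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def mask_constants(query: str) -> str:
    """Replace literals so queries differing only in constants look identical."""
    q = re.sub(r"'[^']*'|\b\d+\b", "?", query.lower())
    return re.sub(r"\s+", " ", q).strip()

queries = [
    "SELECT balance FROM accounts WHERE id = 1001",
    "SELECT balance FROM accounts WHERE id = 2002",
    "UPDATE accounts SET balance = 0 WHERE id = 3003",
    "SELECT * FROM transactions WHERE amount > 10000",
]
masked = [mask_constants(q) for q in queries]

# Pairwise structural distance = 1 - string similarity of the masked queries.
n = len(masked)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = 1.0 - SequenceMatcher(None, masked[i], masked[j]).ratio()
        dist[i, j] = dist[j, i] = d

# Complete-linkage clustering; each resulting group would get one intent label.
labels = fcluster(linkage(squareform(dist), method="complete"), t=0.3, criterion="distance")
for label, query in zip(labels, queries):
    print(label, query)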